33 research outputs found

    Community detection algorithms: a comparative analysis

    Full text link
    Uncovering the community structure exhibited by real networks is a crucial step towards an understanding of complex systems that goes beyond the local organization of their constituents. Many algorithms have been proposed so far, but none of them has been subjected to strict tests to evaluate their performance. Most of the sporadic tests performed so far involved small networks with known community structure and/or artificial graphs with a simplified structure, which is very uncommon in real systems. Here we test several methods against a recently introduced class of benchmark graphs, with heterogeneous distributions of degree and community size. The methods are also tested against the benchmark by Girvan and Newman and on random graphs. As a result of our analysis, three recent algorithms introduced by Rosvall and Bergstrom, Blondel et al. and Ronhovde and Nussinov, respectively, have an excellent performance, with the additional advantage of low computational complexity, which enables one to analyze large systems.Comment: 12 pages, 8 figures. The software to compute the values of our general normalized mutual information is available at http://santo.fortunato.googlepages.com/inthepress

    Robustness of journal rankings by network flows with different amounts of memory

    Full text link
    As the number of scientific journals has multiplied, journal rankings have become increasingly important for scientific decisions. From submissions and subscriptions to grants and hirings, researchers, policy makers, and funding agencies make important decisions with influence from journal rankings such as the ISI journal impact factor. Typically, the rankings are derived from the citation network between a selection of journals and unavoidably depend on this selection. However, little is known about how robust rankings are to the selection of included journals. Here we compare the robustness of three journal rankings based on network flows induced on citation networks. They model pathways of researchers navigating scholarly literature, stepping between journals and remembering their previous steps to different degree: zero-step memory as impact factor, one-step memory as Eigenfactor, and two-step memory, corresponding to zero-, first-, and second-order Markov models of citation flow between journals. We conclude that higher-order Markov models perform better and are more robust to the selection of journals. Whereas our analysis indicates that higher-order models perform better, the performance gain for the second-order Markov model comes at the cost of requiring more citation data over a longer time period.Comment: 9 pages, 5 figure

    Identifying modular flows on multilayer networks reveals highly overlapping organization in social systems

    Full text link
    Unveiling the community structure of networks is a powerful methodology to comprehend interconnected systems across the social and natural sciences. To identify different types of functional modules in interaction data aggregated in a single network layer, researchers have developed many powerful methods. For example, flow-based methods have proven useful for identifying modular dynamics in weighted and directed networks that capture constraints on flow in the systems they represent. However, many networked systems consist of agents or components that exhibit multiple layers of interactions. Inevitably, representing this intricate network of networks as a single aggregated network leads to information loss and may obscure the actual organization. Here we propose a method based on compression of network flows that can identify modular flows in non-aggregated multilayer networks. Our numerical experiments on synthetic networks show that the method can accurately identify modules that cannot be identified in aggregated networks or by analyzing the layers separately. We capitalize on our findings and reveal the community structure of two multilayer collaboration networks: scientists affiliated to the Pierre Auger Observatory and scientists publishing works on networks on the arXiv. Compared to conventional aggregated methods, the multilayer method reveals smaller modules with more overlap that better capture the actual organization

    Mapping bilateral information interests using the activity of Wikipedia editors

    Full text link
    We live in a global village where electronic communication has eliminated the geographical barriers of information exchange. The road is now open to worldwide convergence of information interests, shared values, and understanding. Nevertheless, interests still vary between countries around the world. This raises important questions about what today's world map of in- formation interests actually looks like and what factors cause the barriers of information exchange between countries. To quantitatively construct a world map of information interests, we devise a scalable statistical model that identifies countries with similar information interests and measures the countries' bilateral similarities. From the similarities we connect countries in a global network and find that countries can be mapped into 18 clusters with similar information interests. Through regression we find that language and religion best explain the strength of the bilateral ties and formation of clusters. Our findings provide a quantitative basis for further studies to better understand the complex interplay between shared interests and conflict on a global scale. The methodology can also be extended to track changes over time and capture important trends in global information exchange.Comment: 11 pages, 3 figures in Palgrave Communications 1 (2015

    Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities

    Full text link
    Many complex networks display a mesoscopic structure with groups of nodes sharing many links with the other nodes in their group and comparatively few with nodes of different groups. This feature is known as community structure and encodes precious information about the organization and the function of the nodes. Many algorithms have been proposed but it is not yet clear how they should be tested. Recently we have proposed a general class of undirected and unweighted benchmark graphs, with heterogenous distributions of node degree and community size. An increasing attention has been recently devoted to develop algorithms able to consider the direction and the weight of the links, which require suitable benchmark graphs for testing. In this paper we extend the basic ideas behind our previous benchmark to generate directed and weighted networks with built-in community structure. We also consider the possibility that nodes belong to more communities, a feature occurring in real systems, like, e. g., social networks. As a practical application, we show how modularity optimization performs on our new benchmark.Comment: 9 pages, 13 figures. Final version published in Physical Review E. The code to create the benchmark graphs can be freely downloaded from http://santo.fortunato.googlepages.com/inthepress

    A high-reproducibility and high-accuracy method for automated topic classification

    Full text link
    Much of human knowledge sits in large databases of unstructured text. Leveraging this knowledge requires algorithms that extract and record metadata on unstructured text documents. Assigning topics to documents will enable intelligent search, statistical characterization, and meaningful classification. Latent Dirichlet allocation (LDA) is the state-of-the-art in topic classification. Here, we perform a systematic theoretical and numerical analysis that demonstrates that current optimization techniques for LDA often yield results which are not accurate in inferring the most suitable model parameters. Adapting approaches for community detection in networks, we propose a new algorithm which displays high-reproducibility and high-accuracy, and also has high computational efficiency. We apply it to a large set of documents in the English Wikipedia and reveal its hierarchical structure. Our algorithm promises to make "big data" text analysis systems more reliable.Comment: 23 pages, 24 figure

    Finding Statistically Significant Communities in Networks

    Get PDF
    Community structure is one of the main structural features of networks, revealing both their internal organization and the similarity of their elementary units. Despite the large variety of methods proposed to detect communities in graphs, there is a big need for multi-purpose techniques, able to handle different types of datasets and the subtleties of community structure. In this paper we present OSLOM (Order Statistics Local Optimization Method), the first method capable to detect clusters in networks accounting for edge directions, edge weights, overlapping communities, hierarchies and community dynamics. It is based on the local optimization of a fitness function expressing the statistical significance of clusters with respect to random fluctuations, which is estimated with tools of Extreme and Order Statistics. OSLOM can be used alone or as a refinement procedure of partitions/covers delivered by other techniques. We have also implemented sequential algorithms combining OSLOM with other fast techniques, so that the community structure of very large networks can be uncovered. Our method has a comparable performance as the best existing algorithms on artificial benchmark graphs. Several applications on real networks are shown as well. OSLOM is implemented in a freely available software (http://www.oslom.org), and we believe it will be a valuable tool in the analysis of networks

    Combinatorial approach to Modularity

    Full text link
    Communities are clusters of nodes with a higher than average density of internal connections. Their detection is of great relevance to better understand the structure and hierarchies present in a network. Modularity has become a standard tool in the area of community detection, providing at the same time a way to evaluate partitions and, by maximizing it, a method to find communities. In this work, we study the modularity from a combinatorial point of view. Our analysis (as the modularity definition) relies on the use of the configurational model, a technique that given a graph produces a series of randomized copies keeping the degree sequence invariant. We develop an approach that enumerates the null model partitions and can be used to calculate the probability distribution function of the modularity. Our theory allows for a deep inquiry of several interesting features characterizing modularity such as its resolution limit and the statistics of the partitions that maximize it. Additionally, the study of the probability of extremes of the modularity in the random graph partitions opens the way for a definition of the statistical significance of network partitions.Comment: 8 pages, 4 figure
    corecore